Description

STORAGE MEDIUM INCLUDING TEXT-BASED
CAPTION INFORMATION, REPRODUCING APPARATUS
AND REPRODUCING METHOD THEREOF

Technical Field 

[1] The present invention relates to reproduction of data on a storage medium, and

more particularly, to a storage medium containing text-based caption information 
compatible with the subpicture method of a digital versatile disc (DVD) and the pre- 
sentation method of a Blu-ray disc, and a reproducing apparatus and reproducing 
method thereof. 

Background Art 

[2] Among conventional caption technologies, there exist text-based caption

technologies, which are mainly used in a personal computer (PC), and a subpicture- 
graphic-based caption technology, which is used in a DVD. 

[3] First, as examples of the conventional text-based caption technologies mainly used 

in a PC, there are Synchronized Accessible Media Interchange (SAMI) of Microsoft,
and Real-text technology of RealNetworks. The conventional text-based caption 
technologies have a structure in which a caption is output on the basis of syn- 
chronization information in relation to a file in which video stream data is recorded or 
video stream data provided on a network. 

[4] FIG. 1 is a diagram illustrating the structure of a caption file used in a text-based 

caption technology mainly used in the conventional PC. 

[5] Referring to FIG. 1, there is a text-based caption file for video stream data, and a 

caption for video stream data is output on the basis of synchronization time in- 
formation, for example, <sync time 00:00>, contained in the caption file. An example 
of a caption file constructed assuming continuous reproduction of the video stream 
data is shown. 
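
By way of illustration only, the lookup implied by such a synchronization-time-based caption file can be sketched as follows. The list of (sync time, caption text) pairs, the sample captions, and the function name caption_at are assumptions made for this sketch; they are not taken from FIG. 1.

```python
from bisect import bisect_right

# Hypothetical caption entries: (sync time in seconds, caption text).
# An empty string clears the caption from the screen.
CAPTIONS = [(0.0, "Hello."), (2.5, "How are you?"), (5.0, "")]


def caption_at(captions, t):
    """Return the caption whose sync time is the latest one not after t."""
    times = [start for start, _ in captions]
    i = bisect_right(times, t)
    return captions[i - 1][1] if i > 0 else ""
```

Note that such a lookup says only when a caption starts to be displayed; it carries no information on how long the rendered bitmap should remain in a buffer.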

[6] FIG. 2 is a diagram illustrating the structure of an apparatus reproducing the con- 

ventional text-based captions. 

[7] Referring to FIG. 2, a text caption file is read from a storage medium 200, stored in 

a text caption data and font data buffer 220, and then converted into bitmap image 
graphic data by a text caption decoder 222. By control of a graphic controller 224, the
converted graphic data is output on the screen 232 overlapping video frame data from
a video frame buffer 214 that has been decoded in a video decoder 212.

[8] However, as shown in FIG. 2, the conventional text-based caption file structure

considers only synchronization time (<sync time=00:00>) by which a caption is 
displayed on the screen, and the type, size, and color of font when a caption is output 
on the screen, but does not consider how long a bitmap image is kept in a buffer after 
the bitmap is generated by decoding text caption data. Accordingly, there is a problem 
such that in a reproducing apparatus using a low speed processor, a caption cannot be 
output on the screen in real time as the conventional DVD reproducing apparatus 
reproduces data. 

[9] Meanwhile, the subpicture-graphic-based caption technology used in the con- 

ventional DVD will now be explained. 

[10] A DVD uses a bitmap image for a subtitle. Subtitle data of a bitmap image is 

losslessly encoded and recorded on a DVD. A maximum of 32 losslessly encoded 
bitmap images are recorded on a DVD. 

[11] FIG. 3 is a diagram illustrating the data structure of the conventional DVD 

explaining the structure of a caption file used in a subpicture-graphic-based caption 
technology used in the conventional DVD. 

[12] Referring to FIG. 3, in a DVD, the disc area is divided into a video manager (VMG) 

area and a plurality of video title set (VTS) areas. Title information and information on 
title menus is stored in the VMG area, and information on the title is stored in the 
plurality of VTS areas. The VMG area is formed with 2-3 files, and each of the VTS 
areas is formed with 3-12 files. The VMG area includes a VMGI area storing 
additional information on the VMG, a video object set (VOBS) area storing moving in- 
formation (video objects) on a menu, and a backup area (BUP) of the VMGI. These 
areas are stored as one file and among them the presence of the VOBS area is optional. 

[13] In a VTS area, information on a title that is a reproduction unit, and a VOBS having 

moving picture data is stored. In one VTS, at least one title is recorded. The VTS area 
includes video title set information (VTSI), a VOBS having moving picture data for a 
menu screen, a VOBS having moving picture data of a video title set, and backup data 
of the VTSI. The presence of the VOBS to display the menu screen is optional. Each 
VOBS is again divided into recording units of a VOB and Cells that are recording 
units. One VOB is formed with a plurality of Cells. The smallest recording unit 
mentioned in the present invention is the Cell. 

[14] FIG. 4 is a diagram illustrating a detailed structure of the VOBS having moving 

picture data in the data structure of the conventional DVD shown in FIG. 3. 

[15] Referring to FIG. 4, one VOBS is formed with a plurality of VOBs, and one VOB
is formed with a plurality of Cells. A Cell is again formed with a plurality of video
object units (VOBUs). A VOBU is data encoded by a moving picture experts group
(MPEG) method that is a moving picture coding method used in a DVD. According to 
the MPEG, since images are coded through spatiotemporal compression, a previous or 
succeeding image is required to decode a predetermined image. Accordingly, in order 
to support a random access function by which reproduction starts from an arbitrary 
position, intra coding that does not require a previous or succeeding image is 
performed in each predetermined interval. In the MPEG, this is referred to as an intra 
picture or I picture, and pictures between this I picture and the next I picture are 
referred to as a group of pictures (GOP). Usually, a GOP is formed with 12-15 
images. 
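
As a rough sketch of why intra pictures enable random access, the following finds the picture from which decoding must start in order to reach an arbitrary target picture. The picture-type string, the 12-picture GOP pattern, and the function name random_access_start are assumptions for illustration; pictures are treated in display order, which is a simplification.

```python
def random_access_start(picture_types, target):
    """Return the index of the nearest I picture at or before `target`.

    `picture_types` is a string such as "IBBPBBPBBPBB..." with one
    character per picture (hypothetical representation).
    """
    for i in range(target, -1, -1):
        if picture_types[i] == "I":
            return i
    raise ValueError("no I picture at or before the target")


GOP = "IBBPBBPBBPBB"   # one hypothetical 12-picture GOP
STREAM = GOP * 3       # I pictures at indices 0, 12, and 24
```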

[16] Meanwhile, the MPEG defines system coding (ISO/IEC 13818-1) to combine video

data and audio data into one bitstream. The system coding defines two multiplexing 
methods: a program stream (PS) multiplexing method for optimization to generate one 
program and store in an information storage medium, and a transport stream (TS) mul- 
tiplexing method appropriate to generate a plurality of programs for transmission. The 
conventional DVD employs the PS coding method.

[17] According to the PS coding method, video data or audio data is divided into units

referred to as a pack (PCK) and multiplexed through a time division method. Data 
other than video data and audio data defined by the MPEG is named as a private 
stream and also is contained in the PCKs such that the private stream can be 
multiplexed together with the video data and audio data. 

[18] A VOBU is formed with a plurality of packs (PCK). The first pack (PCK) among 

the plurality of packs (PCK) is a navigation pack (NV_PCK), and the remaining packs 
include video packs (V_PCKs), audio packs (A_PCKs), and subpicture packs
(SP_PCKs). Video data contained in a video pack is formed with a plurality of GOPs. 

[19] The subpicture pack (SP_PCK) is used for 2-dimensional graphic data and caption

data. That is, in a DVD, caption data displayed overlapping a video image is encoded
by the same method as for 2-dimensional graphic data. In the case of DVD, a separate
encoding method to support multiple languages is not employed and each caption data
is converted into graphic data and then processed and recorded by one encoding 
method. The graphic data for a caption is referred to as a subpicture. The subpicture is 
formed with subpicture units (SPUs). A subpicture unit corresponds to one sheet of 
graphic data. 

[20] FIG. 5 is a diagram illustrating the correlation of a subpicture pack (SP_PCK) and a 



WO 2005/031740 






PCT/KR2004/002519 



subpicture unit (SPU) in the structure of the VOBS having moving picture data shown 
in FIG. 4. 

[21] Referring to FIG. 5, one subpicture unit (SPU) includes a subpicture unit header
(SPUH), pixel data (PXD), and a subpicture display control sequence table
(SP_DCSQT). These are sequentially divided and recorded in subpicture packs
(SP_PCK) each with a size of 2048 bytes. At this time, if the last data of the subpicture
unit (SPU) cannot fill one subpicture pack (SP_PCK) fully, the remainder of the last
subpicture pack (SP_PCK) is filled with padding data. As a result, one subpicture unit
(SPU) is formed with a plurality of subpicture packs (SP_PCKs).
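
The division of one subpicture unit into fixed-size packs with padding can be sketched as below. This is a simplified illustration: pack headers are ignored so that all 2048 bytes are treated as payload, and the padding byte value and the function name split_into_packs are assumptions.

```python
PACK_SIZE = 2048  # bytes per subpicture pack, as stated in the text


def split_into_packs(spu: bytes, pad: bytes = b"\xff") -> list[bytes]:
    """Split one SPU into PACK_SIZE-byte packs, padding the last pack.

    Simplification: no pack header is modeled, and the padding byte
    value is an assumption made for this sketch.
    """
    packs = [spu[i:i + PACK_SIZE] for i in range(0, len(spu), PACK_SIZE)]
    if packs and len(packs[-1]) < PACK_SIZE:
        packs[-1] += pad * (PACK_SIZE - len(packs[-1]))
    return packs
```

For example, a 5000-byte SPU occupies three packs, the last of which holds 904 bytes of SPU data followed by 1144 bytes of padding.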

[22] Recorded in the subpicture unit header (SPUH) are the size of the entire subpicture
unit (SPU) and the location from which the subpicture display control sequence table
(SP_DCSQT) having display control information in the subpicture unit (SPU) starts.
The pixel data (PXD) is coded data obtained by compression coding a subpicture. The
pixel data (PXD) forming a subpicture can have four types of values: a background
pixel, a pattern pixel, emphasis pixel-1, and emphasis pixel-2. The values can be
expressed by two bits, and have binary values, 00, 01, 10, and 11, respectively.
Accordingly, the subpicture can be regarded as a set of data formed with a plurality of
lines and having four types of pixel values. Encoding is performed for each line.

[23] FIG. 6 is a diagram illustrating a run-length coding method among methods of
encoding the subpicture unit shown in FIG. 5. 

[24] Referring to FIG. 6, in the run-length coding method, when one to three instances
of an identical pixel data value continue, the number of continued pixels (No_P) is
expressed by 2 bits and after that, a 2-bit pixel data value (PD) is recorded. When 4 to
15 instances of an identical pixel data value continue, the first 2 bits are recorded as 0s,
4 bits are used to record the No_P, and 2 bits are used to record the PD. When 16 to 63
instances of an identical pixel data value continue, the first 4 bits are recorded as 0s, 6
bits are used to record the No_P, and 2 bits are used to record the PD. When 64 to 255
instances of an identical pixel data value continue, the first 6 bits are recorded as 0s, 8
bits are used to record the No_P, and 2 bits are used to record the PD. When a run of
identical pixel data values continues to the end of a line, the first 14 bits are recorded
as 0s, and 2 bits are used to record the PD. When encoding of one line is thus finished, if
byte-unit alignment is not achieved, 4 bits of 0s are recorded. The number of encoded
data bits in one line cannot exceed 1440 bits.
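
The run-length rules above can be sketched as a small encoder that returns the bits of one line as a string of "0" and "1" characters. This is an illustrative sketch only: the function names are hypothetical, the choice to use the end-of-line code only when the final run has 64 or more pixels is an assumption (a shorter counted code is used otherwise), and non-final runs longer than 255 pixels are split into several counted codes.

```python
from itertools import groupby


def encode_run(count: int, pd: int) -> str:
    """Counted code for a run of `count` (1..255) pixels of 2-bit value `pd`."""
    if count <= 3:
        zeros, width = 0, 2
    elif count <= 15:
        zeros, width = 2, 4
    elif count <= 63:
        zeros, width = 4, 6
    elif count <= 255:
        zeros, width = 6, 8
    else:
        raise ValueError("run too long for a counted code")
    return "0" * zeros + format(count, f"0{width}b") + format(pd, "02b")


def encode_line(pixels: list[int]) -> str:
    """Run-length encode one line of 2-bit pixel values."""
    runs = [(pd, len(list(group))) for pd, group in groupby(pixels)]
    bits = []
    for i, (pd, count) in enumerate(runs):
        is_last = i == len(runs) - 1
        if is_last and count >= 64:
            # Run continues to the end of the line: 14 zero bits, then PD.
            bits.append("0" * 14 + format(pd, "02b"))
            continue
        while count > 255:
            bits.append(encode_run(255, pd))
            count -= 255
        bits.append(encode_run(count, pd))
    out = "".join(bits)
    if len(out) % 8:
        out += "0000"  # every code is a multiple of 4 bits, so 4 bits suffice
    if len(out) > 1440:
        raise ValueError("encoded line exceeds 1440 bits")
    return out
```

Because every code length is a multiple of 4 bits, a line that is not byte-aligned is always short by exactly 4 bits, which is consistent with the 4 bits of 0s described above.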

[25] FIG. 7 is a diagram illustrating the data structure of the SP_DCSQT having output
control information of pixel data (PXD) shown in FIG. 5.

[26] Referring to FIG. 7, the SP_DCSQT contains output control information for

outputting the pixel data (PXD) described above. The SP_DCSQT is formed with a
plurality of subpicture display control sequences (SP_DCSQ). One SP_DCSQ is a set
of output control commands (SP_DCCMDs) performed at one time, and is formed
with an SP_DCSQ_STM indicating a starting time, an SP_NXT_DCSQ_SA
containing position information of the next SP_DCSQ, and a plurality of
SP_DCCMDs.

[27] The SP_DCCMD includes control information on how the pixel data (PXD) 

described above is combined with a video image and output, and includes color in- 
formation of the pixel data, transparency information (or contrast information) of the 
video data, information on an output starting time, and an output finishing time. 

[28] FIG. 8 is a diagram illustrating the output result of a subpicture together with

moving picture data according to the data structure described above. 

[29] Referring to FIG. 8, the pixel data itself is losslessly encoded, and information on a

subpicture display area having an area where a subpicture is output in a video display
area having a video image area, and information on an output starting time and
finishing time are contained in the SP_DCSQT as output control information.

[30] In a DVD, subpicture data for caption data of a maximum of 32 different languages 

can be multiplexed together with moving picture data and recorded. These languages
are distinguished by a stream id provided by the MPEG system coding method and a 
sub stream id defined by the DVD. Accordingly, if a user selects one language, the 
subpicture unit (SPU) is extracted by taking only subpicture packs (SP_PCK) having 
the stream id and sub stream id corresponding to the language, and then, by decoding 
the subpicture unit (SPU), caption data is extracted and, according to output control
information, the output is controlled.
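
The language selection described above amounts to filtering packs by two identifiers. A minimal sketch follows, in which the Pack class, the function name extract_spu_bytes, and all identifier values used with it are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pack:
    stream_id: int       # provided by the MPEG system coding method
    sub_stream_id: int   # defined by the DVD
    payload: bytes


def extract_spu_bytes(packs, stream_id, sub_stream_id):
    """Concatenate payloads of the packs belonging to the selected language."""
    return b"".join(
        p.payload
        for p in packs
        if p.stream_id == stream_id and p.sub_stream_id == sub_stream_id
    )
```

The resulting byte sequence would then be handed to a subpicture decoder; packs carrying other languages, video, or audio are skipped.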
Disclosure of Invention 

Technical Problem 

[31] This caption technology based on the subpicture graphic formed with bitmap 

images used in the conventional DVD has the following problems. 

[32] First, if bitmap-based caption data is multiplexed with moving picture data and

recorded, the bit generation amount occupied by the subpicture data should be
considered in advance when the moving picture data is encoded. That is, because the
caption data is converted into graphic data, the amount of data generated differs for
each language and the entire amount is huge. Usually, moving picture data is encoded
only once, and subpicture data for each language is then multiplexed with the encoded
output so that a DVD appropriate to each region is manufactured. However,
depending on the language, there occurs a case in which the amount of subpicture data
is huge, and when the subpicture data is multiplexed with the moving picture data, the
total generated bit amount exceeds the maximum limit. Also, since the subpicture data
is multiplexed between each moving picture data unit, the starting position of each
VOBU becomes different in each region. In a DVD, since the starting position of a
VOBU is separately managed, whenever a multiplexing process begins, this
information should also be updated.

[33] Secondly, since the contents of each subpicture cannot be known, it cannot be used 

for a separate purpose such as outputting two languages at the same time, or outputting 
only caption data without moving picture data to use for language learning. 

[34] As described above, since the text-based caption technology used in a PC and the 

caption technology using subpicture graphics as in a DVD are designed differently, if 
text-based caption data information is applied to the DVD reproducing apparatus
without change, such problems as difficulties in guaranteeing real time reproduction or 
managing a subpicture data buffer occur. 

Technical Solution 

[35] The present invention provides an information storage medium including text-based 

caption information to solve these and/or other problems of the text-based caption 
technology and the subpicture-graphic-based caption technology used in a DVD, and a 
reproducing apparatus and a reproducing method thereof. 

[36] Additional aspects and/or advantages of the invention will be set forth in part in the 

description which follows and, in part, will be obvious from the description, or may be
learned by practice of the invention. 

Advantageous Effects 

[37] According to the present invention as described above, an information storage 

medium including text-based caption information to alleviate the discussed and/or 
other problems of the text-based caption technology and the subpicture-graphic-based 
caption technology used in a DVD, and a reproducing apparatus and a reproducing 
method thereof are provided. 

[38] Accordingly, management of a buffer becomes convenient, and captions in more 

than two different languages can be output at the same time, or only captions can be 
output separately without moving picture information. In addition, real time re- 
production of captions controlled by hardware can be guaranteed. 

[39] Furthermore, since the amount of encoded data of the subtitle data according to the
present invention is relatively less than that of the conventional subpicture type caption
data based on a bitmap image, address management of a VOBU is easier even when 
encoding is again performed in order to process multiple languages. 

Description of Drawings 

[40] These and/or other aspects and advantages of the invention will become apparent 

and more readily appreciated from the following description of the embodiments, 
taken in conjunction with the accompanying drawings of which: 

[41] FIG. 1 is a diagram illustrating the structure of a caption file used in a text-based 

caption technology used in the conventional personal computer (PC); 

[42] FIG. 2 is a diagram illustrating the structure of a reproducing apparatus re- 

producing the conventional text-based captions; 

[43] FIG. 3 is a diagram illustrating the data structure of the conventional DVD 

explaining the structure of a caption file used in a subpicture-graphic-based caption 
technology used in the conventional DVD; 

[44] FIG. 4 is a diagram illustrating a detailed structure of video object set (VOBS) 

having moving picture data in the data structure of the conventional DVD shown in 
FIG. 3; 

[45] FIG. 5 is a diagram illustrating the correlation of a subpicture pack (SP_PCK) and

a subpicture unit (SPU) in the structure of the VOBS having moving picture data 
shown in FIG. 4; 

[46] FIG. 6 is a diagram illustrating a run-length coding method among methods of 

encoding the subpicture unit shown in FIG. 5; 
[47] FIG. 7 is a diagram illustrating the data structure of the SP_DCSQT having output 

control information of pixel data (PXD) shown in FIG. 5; 
[48] FIG. 8 is a diagram illustrating the output result of a subpicture together with 

moving picture data according to the data structure described above;
[49] FIG. 9 is a block diagram of a reproducing apparatus processing a text caption 

according to an embodiment of the present invention; 
[50] FIG. 10 is a detailed block diagram of the reproducing apparatus shown in FIG. 9; 

[51] FIG. 11A is an example of text data to generate pixel data according to an

embodiment of the present invention; 
[52] FIG. 11B is an example of graphic control information to control real time display

of a caption according to an embodiment of the present invention; 
[53] FIG. 12 is a diagram of an embodiment of subtitle data according to the present

invention using a subpicture data structure of a DVD;
[54] FIG. 13 is a diagram of an embodiment of subtitle data according to the present

invention using a presentation data structure of a Blu-ray disc; 
[55] FIG. 14 is a diagram of an embodiment of subtitle data in a text format that can be 

applied to a DVD; 

[56] FIG. 15 is a diagram of an embodiment of subtitle data in a text format that can be 

applied to a Blu-ray disc; 
[57] FIG. 16 is a diagram illustrating the output result of caption data according to an 

embodiment of the present invention; and 
[58] FIG. 17 is a flowchart illustrating operations performed in a method of processing 

a text caption according to an embodiment of the present invention. 

Best Mode 

[59] According to an aspect of the present invention, there is provided a storage medium 

including: moving picture data; and subtitle data to be output as a graphic overlapping
an image based on the moving picture data, wherein the subtitle data includes: text 
data to generate pixel data converted into a bitmap image; and control information to 
control the pixel data to be output in real time. 

[60] The text data may generate the pixel data to be converted into the bitmap image 

such that caption contents are output as the graphic overlapping the image. 

[61] The text data may further include style information to specify the style of the 

caption output as the graphic overlapping the image, wherein the style information 
may include at least one of a pixel data area, a background color, a starting point at 
which a first letter of text begins, line spacing information, an output direction, a type 
of a font, font color, and a character code. 

[62] The control information may include time information indicating a time at which 

the pixel data is generated in a buffer memory and a time at which the pixel data is 
deleted in the buffer memory, and position information recording a position at which 
the pixel data is output. 
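
One possible in-memory representation of such subtitle data is sketched below, pairing text data and style information with the control information. All class and field names, the default values, and the choice of seconds as the time unit are assumptions for illustration; only a few of the style items listed above are modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Style:
    font: str = "default"
    font_color: str = "white"
    background_color: str = "transparent"
    line_spacing: int = 0


@dataclass(frozen=True)
class ControlInfo:
    generate_time: float        # when pixel data is generated in the buffer
    delete_time: float          # when pixel data is deleted from the buffer
    position: tuple[int, int]   # (x, y) at which the pixel data is output

    def __post_init__(self):
        if self.delete_time <= self.generate_time:
            raise ValueError("pixel data must be deleted after it is generated")


@dataclass(frozen=True)
class SubtitleEntry:
    text: str
    control: ControlInfo
    style: Style = field(default_factory=Style)
```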

[63] The subtitle data may include the text data corresponding to pixel data (PXD) 

contained in subpicture information and the control information corresponding to 
display control information (SP_DCSQT). The subtitle data may be in a text format or 
a packet format. 

[64] The subtitle data may include the text data corresponding to a presentation 

composition segment (PCS) contained in presentation data, and the control information 
corresponding to an object definition segment (ODS). The subtitle data may be in a 
text format or in a packet format. 






[65] According to another aspect of the present invention, there is provided an apparatus 

to reproduce information from a storage medium including moving picture data and
subtitle data to be output as a graphic overlapping on an image based on the moving
picture data, the apparatus including: a text caption decoder to decode text data
contained in the subtitle data and generate pixel data converted into a bitmap image,
and decode and parse control information contained in the subtitle data to control a 
caption to be output in real time; and a graphic controller to control the pixel data to be 
output in real time using the control information. 

[66] The text caption decoder may include: a text caption parser to decode and parse the 

text data and the control information; and a font renderer to convert the parsed text 
data into a bitmap image so that the parsed text is output as the graphic overlapping the 
image. 

[67] The text caption parser may decode and parse style information from the text data 

and specify an output style of the caption, and the font renderer may convert the text 
data into the bitmap image reflecting the parsed style information. 

[68] The text caption parser may parse the text data and transfer the parsed text data to 

the font renderer. The text caption parser may parse time information indicating a time 
at which the pixel data is generated in a buffer memory and a time at which the pixel 
data is deleted in the buffer memory, and position information recording a position at 
which the pixel data is output, from the control information, and transfer the parsed in- 
formation to the graphic controller, and the graphic controller may control the pixel 
data to be output in real time by using the parsed time information and position in- 
formation. 

[69] The subtitle data may include the text data corresponding to pixel data contained in 

subpicture information of a DVD formed by a bitmap image reproducing method and 
the control information corresponding to display control information (SP_DCSQT). 
The text caption parser may transfer the text data to the font renderer, and the control 
information to the graphic controller, and the graphic controller may control the pixel 
data (PXD) to be output in real time by using the transferred control information. 

[70] The subtitle data may include the text data corresponding to a PCS contained in 

presentation data of a Blu-ray disc formed by a bitmap image reproducing method and 
the control information corresponding to an ODS. The text caption parser may transfer 
the text data to the font renderer, and the control information to the graphic controller, 
and the graphic controller may control the pixel data to be output in real time by using 
the transferred control information.

[71] According to still another aspect of the present invention, there is provided a

method of reproducing information from a storage medium including moving picture 
data and subtitle data to be output as a graphic overlapping on an image based on the 
moving picture data, the method including: reading the subtitle data including text data
and control information from the storage medium; decoding the text data, parsing 
caption contents and output style information, and converting the caption contents into 
pixel data formed as a bitmap image based on the parsed style information; decoding 
the control information, parsing time information to control the pixel data to be output 
in real time, and parsing position information to control a position at which a caption is 
output; and outputting the converted pixel data in real time according to the parsed 
time information and position information. 

Mode for Invention 

[72] Reference will now be made in detail to the embodiments of the present invention, 

examples of which are illustrated in the accompanying drawings, wherein like 
reference numerals refer to the like elements throughout. The embodiments are 
described below to explain the present invention by referring to the figures. 

[73] Referring to FIG. 9, the reproducing apparatus processing a text-based caption 

according to an embodiment of the present invention includes buffer units 902 and 
906, a video data processing unit 910, a text caption data processing unit 920, an audio 
data processing unit 930, and a blender 940. 

[74] According to the types of data to be stored, the buffer units 902 and 906 include an

AV data buffer 902 storing moving picture data, and a text caption data and font data
buffer 906 storing text-based caption data. Data read from a variety of storage media
900, including a removable storage medium such as an optical disc, a local storage,
and storages on the Internet, is temporarily stored in each buffer according to the type 
of the data. 

[75] The video data processing unit 910 includes a video decoder 914 and a video frame 

buffer 916. The video decoder 914 receives compression coded moving picture data 
from the AV data buffer 902 and decodes the data. The decoded video data is output to 
the screen 942 through the video frame buffer 916. 

[76] The text caption data processing unit 920 includes a text caption decoder 922, a 

subpicture decoder 924, and a graphic controller 926. The reproducing apparatus 
according to the present invention has the subpicture decoder 924 to process a subtitle 
in the conventional multiplexed subpicture type, and, in addition, has the text caption 
decoder 922 so that text-based caption data according to an embodiment of the present
invention can be processed. From the subtitle data, the text caption decoder 922 decodes
the text data used to generate a bitmap image for a caption and the control information
used to control real time reproduction of the caption, and generates pixel data converted
into a bitmap image. The control information among the decoded data is transferred to the graphic
controller 926 such that the generated pixel data is controlled to be output in real time.

[77] The audio data processing unit 930 has an audio decoder to decode audio data so 

that the audio data is decoded and output through a speaker 932. 

[78] The blender 940 superimposes a bitmap image obtained by rendering caption data 

on video data obtained by decoding moving picture data, and outputs the data to the 
screen 942. 

[79] FIG. 10 is a detailed block diagram of the reproducing apparatus to process a text- 

based caption shown in FIG. 9. 

[80] Referring to FIG. 10, the structure of the text caption data processing unit 920 il- 

lustrated in FIG. 9 is shown in detail. 

[81] The reproducing apparatus according to this embodiment of the present invention

includes buffer units 1010, 1020, 1030, and 1040, a video data processing unit 910, a 
text caption data processing unit 920, and a blender 1039. Explanation of the audio 
processing unit described in FIG. 9 will be omitted. 

[82] The buffer units 1010, 1020, 1030, and 1040 include a video data buffer 1010, a 

subpicture data buffer 1020, a text caption data buffer 1030, and a font data buffer 
1040. Moving picture data and subtitle data are read from a variety of storage media 
1000, including a removable storage medium such as an optical disc, a local storage,
and storages on the Internet, and, according to the type of data, stored in respective
buffers temporarily. The moving picture data (AV data) is de-multiplexed and
temporarily stored: video data is stored in the video data buffer 1010, subpicture data for a
subtitle is stored in the subpicture data buffer 1020, and audio data is stored in the
audio data buffer (not shown). Meanwhile, text data to generate pixel data and control
information to control a caption to be output in real time, as subtitle data for a
text-based caption, are temporarily stored in the text caption data buffer 1030, and font data
for a subtitle is temporarily stored in the font data buffer 1040. The video data
processing unit 910 includes a video decoder 1012 and a video frame buffer 1014, and 
is the same as explained with reference to FIG. 9. 

[83] The text caption data processing unit 920 includes a text caption parser 1031, a font 

renderer 1034, a subpicture decoder 1033, a graphic controller 1038, a variety of
buffers 1032, 1035, and 1036, and a color lookup table (CLUT) 1037.

[84] The text caption parser 1031 decodes and parses text data and control information

included in subtitle data. Also, it decodes and parses style information specifying an
output style of a caption further included in text data. The parsed text data is
transferred to the font renderer 1034 along path 2.

[85] The font renderer 1034 generates a bitmap image so that the parsed text data can be

output as a graphic for overlapping. At this time, by reflecting the parsed style
information, a bitmap image is generated and the generated graphic data is temporarily
stored in the pixel data buffer 1035 along path 3. The data structure of the text data and
style information will be explained later.

[86] The subpicture decoder 1033 decodes subpicture data for a subtitle de-multiplexed 

from the moving picture data. This is provided for compatibility with caption data of 
the conventional DVD subpicture method. According to another embodiment of the 
present invention, when text-based subtitle data according to the present invention is 
packetized and included in moving picture data, text data and control information are 
de-multiplexed and transferred to the text caption parser 1031 along path 9. 

[87] The graphic controller 1038 controls the caption to be output in real time by using 

control information. In the case of the conventional text-based caption technology such 
as SAMI of Microsoft described above, only a time for a caption to be output is
specified, and therefore, if the caption is reproduced in a hardware device using a low
speed processor, real time reproduction, in which moving picture data and caption data 
are synchronized and output, may not be guaranteed. 

[88] However, in the case of the reproducing apparatus according to the present 

invention, time information regarding when pixel data, which is converted into a 
bitmap image in the control information described above, is generated and deleted in 
the buffer memory, and position information regarding a position from which the pixel 
data is output, are parsed and the output of the pixel data buffer is controlled. By doing 
so, moving picture data and captions can be synchronized and reproduced in real time. 
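
The role of the time and position information can be sketched as a function that reports which pixel data should be resident in the buffer at a given moment, and where each is to be output. The Control tuple, its field names, the function name resident_pixel_data, and the half-open interval convention (generated at or before the current time, deleted strictly after it) are assumptions for this sketch.

```python
from typing import NamedTuple


class Control(NamedTuple):
    caption_id: str
    generate_at: float          # time the pixel data is generated in the buffer
    delete_at: float            # time the pixel data is deleted from the buffer
    position: tuple[int, int]   # output position on the screen


def resident_pixel_data(controls, now):
    """Return (caption id, position) for pixel data held in the buffer at `now`."""
    return [
        (c.caption_id, c.position)
        for c in controls
        if c.generate_at <= now < c.delete_at
    ]
```

Because the deletion time is known in advance, a controller working this way can bound how much pixel data is held at once, which a file carrying only a display start time does not allow.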

[89] The buffers 1032, 1035, and 1036 are a graphic control information buffer 1032,
a pixel data buffer 1035, and a subpicture frame buffer 1036.

[90] The graphic control information buffer 1032 temporarily stores control information
parsed in the text caption parser 1031, and the pixel data buffer 1035 temporarily
stores graphic data converted into a bitmap image.

[91] The subpicture frame buffer 1036 temporarily stores pixel data so that the
subpicture for a caption can be output by controlling the output of the pixel data
according to the time information that is included in the control information from the
graphic controller 1038.

[92] The color lookup table (CLUT) 1037 controls the color of a caption to be output by
using palette information included in control information.

[93] The blender 1039 superimposes the graphic image of a caption output from the text
caption data processing unit 920 on an image output from the video data processing
unit 910, and outputs the overlapping images on the screen 1041.
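As an illustration only, the cooperation of the color lookup table and the blender can be sketched as a palette lookup followed by a per-pixel mix. The specification does not state the blending rule; the alpha blending used here, the palette contents, and the names CLUT, blend_pixel, and blend_row are assumptions made for this sketch.

```python
# Hypothetical palette: index -> (R, G, B, alpha), alpha in 0..255.
CLUT = {
    0: (0, 0, 0, 0),          # fully transparent background
    1: (255, 255, 255, 255),  # opaque white caption text
    2: (0, 0, 0, 128),        # half-transparent black
}


def blend_pixel(video_rgb, index, clut=CLUT):
    """Mix one caption pixel (a palette index) over one video pixel."""
    r, g, b, a = clut[index]
    return tuple(
        (a * c + (255 - a) * v) // 255
        for c, v in zip((r, g, b), video_rgb)
    )


def blend_row(video_row, caption_row, clut=CLUT):
    """Blend a row of caption palette indices over a row of video pixels."""
    return [blend_pixel(v, i, clut) for v, i in zip(video_row, caption_row)]


video_row = [(10, 20, 30), (10, 20, 30), (255, 255, 255)]
caption_row = [0, 1, 2]
print(blend_row(video_row, caption_row))
```

Keeping the caption as palette indices until the last step means the color of a caption can be changed through the palette alone, without rendering the text again.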

[94] The operation of each block of the reproducing apparatus according to the
embodiment of the present invention illustrated in FIG. 10 and described above can be
summarized as follows.

[95] First, moving picture data read from the storage medium 1000 is de-multiplexed,
and the video data is decoded by the video decoder 1012 after passing through the
video data buffer 1010. After being output through the video frame buffer 1014, the
decoded video data is output together with the graphic data of a caption output from
the text caption processing unit 920, with the graphic data overlapping the video data.
Audio data in the moving picture data is decoded by the audio decoder of the audio
data processing unit 930 and output through the speaker 932.

[96] Meanwhile, text-based subtitle data according to this embodiment of the present
invention which is read from the storage medium 1000 is parsed into text data and
control information in the text caption parser 1031 after passing through the text
caption data buffer 1030. The parsed text data is transferred to the font renderer 1034
along path 2. Here, the text data is converted into graphic data in which caption
contents are formed as a bitmap image, and the graphic data is stored in the pixel data
buffer 1035.

[97] Meanwhile, control information, parsed into time information to output a caption in
real time and output position information of the caption, is transferred through the
graphic control information buffer 1032 along path 1 to the graphic controller 1038
along path 7. The graphic controller 1038 adjusts the output speed of the graphic data
stored in the pixel data buffer 1035 by using control information, outputs the graphic
data to the subpicture frame buffer 1036, and, by referring to the color lookup table
1037, reflects color. The graphic controller 1038 superimposes the graphic data on the
moving picture data through the blender 1039 and outputs the data to the screen.

[98] Meanwhile, as another embodiment of the present invention, when text-based
subtitle data is packetized and multiplexed with moving picture data, subtitle data is
decoded by the subpicture decoder 1033 and transferred to the text caption parser 1031
along path 9. The processing of the subtitle data thereafter is the same as described
above.

[99] As an embodiment of the present invention, when subtitle data includes the text
data corresponding to pixel data (PXD) among subpicture information of a DVD
formed by a bitmap data reproduction method and the control information
corresponding to display control information (SP_DCSQT), the subtitle data decoded by
the subpicture decoder 1033 is transferred to the text caption parser 1031, and here,
text data is transferred to the font renderer 1034 and control information is transferred
to the graphic controller 1038 such that, by using the control information transferred to
the graphic controller 1038, a caption is controlled to be output in real time.

[100] As another embodiment of the present invention, when subtitle data includes the
text data corresponding to a presentation composition segment (PCS) among
presentation data of a Blu-ray disc formed by a bitmap data reproduction method and the
control information corresponding to an object definition segment (ODS), the subtitle
data decoded by the subpicture decoder 1033 is transferred to the graphic controller
1038 such that, by using the control information transferred to the graphic controller
1038, a caption is controlled to be output in real time.

[101] A storage medium on which text-based subtitle data according to an embodiment of
the present invention is recorded will now be described.

[102] The storage medium according to this embodiment of the present invention includes
moving picture data and subtitle data that is output as a graphic overlapping an image
based on the moving picture. The subtitle data includes text data to generate pixel data
and control information to control a caption to be output in real time.

[103] Text data is utilized to convert caption contents into a bitmap image to be output as
a graphic for overlapping. Text data further includes style information specifying the
style of a font. Preferably, though not necessarily, the style information includes at
least one of a pixel data area, a background color, the starting point at which the first
letter of text begins, line spacing information, an output direction, the type of a font,
font color, and a character code.

[104] Meanwhile, the control information includes time information regarding when the 
pixel data obtained by rendering text data is generated and deleted in the buffer 
memory, and position information regarding a position at which pixel data is output. 
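The composition of the subtitle data described above can be pictured with the following sketch. The field names, types, and default values are hypothetical; the specification lists the kinds of information carried but does not prescribe a concrete layout.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class StyleInfo:
    """Style items named in the description; all names here are assumptions."""
    area_size: Optional[Tuple[int, int]] = None  # pixel data area (width, height)
    background_color: int = 0
    start_point: Tuple[int, int] = (0, 0)        # where the first letter begins
    line_spacing: int = 0
    direction: int = 0
    font_type: int = 0
    font_color: int = 0
    char_code: int = 0


@dataclass
class TextData:
    contents: str                                # caption contents to render
    style: StyleInfo = field(default_factory=StyleInfo)


@dataclass
class ControlInfo:
    create_time: float                           # pixel data generated in buffer
    delete_time: float                           # pixel data deleted from buffer
    position: Tuple[int, int]                    # where the pixel data is output


@dataclass
class SubtitleData:
    text: TextData
    control: ControlInfo


caption = SubtitleData(
    text=TextData("Hello, World"),
    control=ControlInfo(create_time=1.0, delete_time=3.0, position=(100, 400)),
)
print(caption)
```

Separating the text data from the control information mirrors the two paths of the reproducing apparatus: one part goes to the font renderer, the other to the graphic controller.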

[105] As an embodiment of a storage medium according to the present invention, subtitle
data may include text data corresponding to pixel data (PXD) among subpicture
information, and control information corresponding to display control information
(SP_DCSQT) such that predetermined contents similar to the subpicture information of a
DVD formed by a bitmap data reproduction method can be included. Subtitle data may
be implemented in a text format or may be implemented as data in the form of packets.

[106] Also, as another embodiment of a storage medium according to the present
invention, subtitle data may include text data corresponding to a PCS among
presentation data, and control information corresponding to an ODS such that
predetermined contents similar to presentation data of a Blu-ray disc formed by a bitmap
data reproduction method can be included. Subtitle data may be implemented in a text
format or may be implemented as data in the form of packets.

[107] FIG. 11A is an example of text data to generate pixel data according to an
embodiment of the present invention.

[108] Referring to FIG. 11A, in a text data area, text information includes caption
contents and style information required to generate a bitmap image of pixel data.

[109] That is, text information includes, for example, the contents of a caption to be
output and style information specifying the output style of the caption. As style
information, when multiple lines of text are output, information on line spacing is
included, and information indicating the output direction of text (left->right,
right->left, up->down) can be included. Also, information on a font, such as the size of
text, bold, Italic, and underline, is included, and information on line change to render
text to begin from the next line, and information on the color of text can be included. In
addition, character code information for encoding can be included; for example,
information on whether a character code to be used is 8859-1 or UTF-16 can be included.
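The role of the character code information can be shown with a short sketch. It assumes that the value 0 selects ISO-8859-1 and the value 1 selects UTF-16, as in the style strings of this specification; the names CHAR_CODES and decode_caption are hypothetical, and the UTF-16 byte order is assumed to be indicated by a byte order mark.

```python
# Assumed mapping from the character code value to a Python codec name.
CHAR_CODES = {0: "iso-8859-1", 1: "utf-16"}


def decode_caption(raw: bytes, code: int = 0) -> str:
    """Decode raw caption bytes using the signalled character code."""
    return raw.decode(CHAR_CODES[code])


latin = "café".encode("iso-8859-1")
wide = "자막".encode("utf-16")  # Python prepends a byte order mark here

print(decode_caption(latin, 0))
print(decode_caption(wide, 1))
```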

[110] This text information is an example according to this embodiment of the present 
invention and can be modified and implemented to fit the characteristic of a medium, 
such as a DVD and a Blu-ray disc, to which the present invention is applied. 

[111] FIG. 11B is an example of graphic control information to control real time display
of a caption according to an embodiment of the present invention.

[112] Referring to FIG. 11B, control information to control output of the pixel data
converted into a bitmap image is shown.

[113] That is, in order to indicate the size of the pixel data area in which the text data is
converted into a bitmap image and rendered, information on the width and height of
the pixel data area can be recorded. Also, information on the color of the background
of the pixel data, time information regarding when the pixel data is generated and
deleted in the pixel data buffer memory, and starting point information indicating a
position at which the first line of text characters begins can be recorded. These data
items are included in subtitle data as control information, and play the role of
controlling a caption to be output in real time.

[114] Also, when control data is applied to a Blu-ray disc, in order to collect and output a
plurality of pixel data items in one screen, construction information collecting a
plurality of data areas into one page can also be included. A color lookup table
including information to be used for the background color and foreground color of
caption text used in the page can be included. Since a specified area among pixel data
information is output on the screen, area specifying information ((Xs, Ys), the width
and height information of a pixel data area, or information on starting point (Xs, Ys)
and end point (Xe, Ye)) can be included. Also, starting point information in a pixel
data area corresponding to the first starting point of a subpicture display area explained
with reference to FIG. 8 can also be included. Meanwhile, preferably, though not
necessarily, time information is included which indicates a time when pixel data
temporarily stored in a buffer is output, and a time when the pixel data is deleted.
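The two forms of area specifying information mentioned above, a starting point with width and height or a starting point with an end point, carry the same information. The following sketch converts between them under the assumption that the end point (Xe, Ye) is the last pixel inside the area; the specification does not say whether the end point is inclusive, and the function names are hypothetical.

```python
def corners_from_size(xs, ys, width, height):
    """(Xs, Ys) with width and height -> (Xs, Ys, Xe, Ye), end point inclusive."""
    return xs, ys, xs + width - 1, ys + height - 1


def size_from_corners(xs, ys, xe, ye):
    """(Xs, Ys) and (Xe, Ye) -> (Xs, Ys, width, height), end point inclusive."""
    return xs, ys, xe - xs + 1, ye - ys + 1


area = corners_from_size(10, 20, 100, 50)
print(area)
print(size_from_corners(*area))
```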

[115] This control information is but one example according to an embodiment of the 
present invention, and can be modified and implemented to fit the characteristic of a 
medium, such as a DVD and a Blu-ray disc, to which the present invention is applied. 

[116] FIG. 12 is a diagram of an embodiment of subtitle data according to the present 
invention using a subpicture data structure of a DVD. 

[117] Referring to FIG. 12, subtitle data according to this embodiment of the present
invention can be implemented in a packet format of the MPEG method that is the
construction method of a subpicture data stream of a DVD. That is, in a packetized
element stream (PES) structure, in addition to a SPUH having header information, text
caption data according to this embodiment of the present invention can be made to be
recorded in a PXD area for pixel data, and control information according to this
embodiment of the present invention can be made to be recorded in an SP_DCSQT
area for output control information. Obviously, subtitle data according to this
embodiment of the present invention can be implemented as binary data in the form of
a packet, and can also be implemented in a text format including contents similar to the
subpicture data stream described above. Any data in a text format or in a binary format
can be parsed by the text caption parser 1031 described with reference to FIG. 10.
Parsed text data is transferred to the font renderer 1034 along path 2, and control
information is transferred to the graphic controller 1038 along path 1 such that based on
the control information, a caption converted into a bitmap image can be output in real
time.

[118] FIG. 13 is a diagram of an embodiment of subtitle data according to the present
invention using a presentation data structure of a Blu-ray disc.

[119] Referring to FIG. 13, subtitle data according to this embodiment of the present
invention can be implemented in a packet format of the MPEG method that is the
construction method of a presentation data stream of a Blu-ray disc. That is, in a PES
structure, control information can be recorded to correspond to a PCS area and text
caption data can be recorded to correspond to an ODS. In addition, a palette definition
segment (PDS) and an end segment (END) can be further included. Obviously, subtitle
data according to this embodiment of the present invention can be implemented as
binary data in the form of a packet, and can also be implemented in a text format
including contents similar to the presentation data stream described above.

[120] Any data in a text format or in a binary format can be parsed by the text caption 
parser 1031 described with reference to FIG. 10. Parsed text data is transferred to the 
font renderer 1034 along path 2, and control information is transferred to the graphic 
controller 1038 along path 1 such that based on the control information, a caption 
converted into a bitmap image can be output in real time. 

[121] FIGS. 14 and 15 illustrate examples of embodiments of subtitle data implemented
in a text format. In particular, FIG. 14 illustrates an example of an embodiment of
subtitle data in a text format that can be applied to a DVD, and the subtitle data
includes text and control information. Also, FIG. 15 illustrates an example of an
embodiment of subtitle data in a text format that can be applied to a Blu-ray disc, and
the subtitle data includes text data and control information and can further include
color information. FIGS. 14 and 15 are just examples of the data structure of a storage
medium according to embodiments of the present invention, and the data structure can
be modified and implemented in a variety of ways.

[122] In order to specify the style of subtitle data according to the embodiment of the 
present invention described above, the following character strings can be used: 

[123] \c[n]\: specifies a color to be used in text. The basic value is 0.

[124] \b[n]\: specifies a background color to be used for the background of text. This
should be used at the front of a character string, and the basic value is 0.

[125] \f[n]\: specifies the type of font to be used in text. The basic value is 0. 

[126] \s[n]\: specifies the size of font to be used in text. The unit is a pixel and the basic 
value is 0. 

[127] \e[n]\: specifies a character code to be used for encoding text. The encoding method
can be changed. If 0, ISO-8859-1 is used, and if 1, UTF-16 is used, and the basic value
is 0.

[128] \o[n]\: specifies the position of a starting point from which text is rendered in a 
pixel data area. 

[129] \l[n]\: specifies a line space when a line change for a text character string is
performed. The unit for n is a pixel and the basic value is 0.

[130] \d[n]\: specifies the output direction of text. If n is 0, text is output from left to right
in the horizontal direction, and if n is 1, text is output from right to left in the
horizontal direction. If n is 2, text is output in the vertical direction, and if there is a
line change, the line change is performed from right to left. If n is 3, text is output in
the vertical direction, and if there is a line change, the line change is performed from
left to right. The basic value is 0.

[131] \b[n]\: selects the size of a text character as bold or normal. Bold is 1 and normal is 
0, and the basic value is 0. 

[132] \i[n]\: selects the shape of a text character as Italic or normal. Italic is 1 and normal 
is 0, and the basic value is 0. 

[133] \u[n]\: specifies whether or not to underline a text character. To underline is 1 and 
no underline is 0, and the basic value is 0. 

[134] \n\: performs line change. The basic value is 0. 

[135] \\:\ outputs a character. The basic value is 0. 

[136] FIG. 16 is a diagram illustrating the output result of caption data according to an 
embodiment of the present invention. 

[137] Referring to FIG. 16, for example, when the following character string is used as
style information, the output result on the screen is shown. That is, when style
information, \o2000\ \b0\ \c1\ \f0\ \l20\Hello, \b1\Subtitle\b0\ \i1\ \n\World, is used, the
output result of pixel data generated by parsing this information is shown.
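How a parser might separate the style character strings from the caption contents can be sketched as follows. This is only an illustration: it assumes that every style code has the form of a backslash, one lowercase letter, an optional decimal number, and a closing backslash, and it reads the example string as \o2000\ \b0\ \c1\ \f0\ \l20\Hello, \b1\Subtitle\b0\ \i1\ \n\World. The names TOKEN and parse_styled_text are hypothetical.

```python
import re

# Assumed token shape: backslash, one letter, optional digits, backslash.
TOKEN = re.compile(r"\\([a-z])(\d*)\\")


def parse_styled_text(s):
    """Split a styled string into ('code', letter, n) and ('text', str) items."""
    items, pos = [], 0
    for m in TOKEN.finditer(s):
        if m.start() > pos:
            # Plain caption text found between two style codes.
            items.append(("text", s[pos:m.start()]))
        n = int(m.group(2)) if m.group(2) else None
        items.append(("code", m.group(1), n))
        pos = m.end()
    if pos < len(s):
        items.append(("text", s[pos:]))
    return items


example = r"\o2000\ \b0\ \c1\ \f0\ \l20\Hello, \b1\Subtitle\b0\ \i1\ \n\World"
items = parse_styled_text(example)
for item in items:
    print(item)
```

A renderer would then walk this list in order, updating its current style on each code item and drawing each text item with the style in force at that point.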

[138] For information regarding a font used in text data, font information recorded
separately from subtitle data is received from a disc or a network, and related font
information is stored in a font buffer memory such that the font information can be used.

[139] A method of processing a text caption based on the structures of the reproducing 
apparatus and the storage medium described above will now be explained. 

[140] FIG. 17 is a flowchart illustrating operations performed in a method of processing a 
text caption according to an embodiment of the present invention. 

[141] Referring to FIG. 17, in order to reproduce data on a storage medium including
moving picture data and subtitle data that is output as a graphic overlapping an image
based on the moving picture data, first, subtitle data including text data and control
information is read from the storage medium in operation 1502. The read text data is
decoded, caption contents and output style information are parsed, and based on the
parsed style information, caption contents are converted into pixel data in operation
1504. The read control information is decoded, and time information and position
information to control a caption to be output in real time are parsed in operation 1506.
According to the parsed time information and position information, the converted pixel
data is output in real time in operation 1508.
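The four operations of FIG. 17 can be outlined with a minimal sketch. The function names, the dictionary layout of the stored data, and the string placeholders that stand in for rendered bitmaps are all assumptions made for illustration; actual font rendering is not shown.

```python
def read_subtitle(medium):
    """Operation 1502: read text data and control information."""
    return medium["text"], medium["control"]


def convert_to_pixels(text):
    """Operation 1504 (stub): stands in for parsing and font rendering."""
    return [f"<bitmap:{line}>" for line in text.split("\n")]


def parse_control(control):
    """Operation 1506: extract time information and position information."""
    return (control["start"], control["end"]), control["position"]


def output_caption(pixels, times, position, now):
    """Operation 1508: output pixel data only inside its time window."""
    start, end = times
    return (position, pixels) if start <= now < end else None


medium = {
    "text": "Hello\nWorld",
    "control": {"start": 1.0, "end": 3.0, "position": (100, 400)},
}

text, control = read_subtitle(medium)
pixels = convert_to_pixels(text)
times, position = parse_control(control)
frame = output_caption(pixels, times, position, now=2.0)
print(frame)
```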

[142] The present invention can also be embodied as computer readable codes on a 
computer readable recording medium. The computer readable recording medium is 
any data storage device that can store data which can be thereafter read by a computer 
system. Examples of the computer readable recording medium include read-only 
memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy 
disks, optical data storage devices, and carrier waves (such as data transmission 
through the Internet). The computer readable recording medium can also be distributed 
over network coupled computer systems so that the computer readable code is stored 
and executed in a distributed fashion. 

[143] According to the present invention as described above, an information storage 
medium including text-based caption information to alleviate the discussed and/or 
other problems of the text-based caption technology and the subpicture-graphic-based 
caption technology used in a DVD, and a reproducing apparatus and a reproducing 
method thereof are provided. 

[144] Accordingly, management of a buffer becomes convenient, and captions in more
than two different languages can be output at the same time, or only captions can be
output separately without moving picture information. In addition, real time
reproduction of captions controlled by hardware can be guaranteed.

[145] Furthermore, since the amount of encoded data of the subtitle data according to the 
present invention is relatively less than that of the conventional subpicture type caption 
data based on a bitmap image, address management of a VOBU is easier even when 
encoding is again performed in order to process multiple languages. 

[146] Although a few embodiments of the present invention have been shown and
described, it would be appreciated by those skilled in the art that changes may be made
in these embodiments without departing from the principles and spirit of the invention,
the scope of which is defined in the claims and their equivalents.

Industrial Applicability 

[147] The present invention applies to reproduction of data on a storage medium, and
more particularly, to a storage medium containing text-based caption information
compatible with the subpicture method of a digital versatile disc (DVD) and the
presentation method of a Blu-ray disc, and a reproducing apparatus and reproducing
method thereof.

Claims

[1] 1. A storage medium comprising:

moving picture data; and

subtitle data to be output as a graphic overlapping an image based on the moving
picture data,

wherein the subtitle data comprises:

text data to generate pixel data converted into a bitmap image, and

control information to control the pixel data to be output in real time.

[2] 2. The storage medium of claim 1, wherein the text data generates the pixel data
to be converted into the bitmap image such that caption contents are output as
the graphic overlapping the image.

[3] 3. The storage medium of claim 2, wherein the text data further comprises:

style information to specify the style of the caption output as the graphic
overlapping the image;

wherein the style information comprises at least one of a pixel data area, a
background color, a starting point at which a first letter of text begins, line
spacing information, an output direction, a type of a font, font color, and a
character code.

[4] 4. The storage medium of claim 1, wherein the control information comprises:

time information indicating a time at which the pixel data is generated in a buffer
memory and a time at which the pixel data is deleted in the buffer memory; and
position information recording a position at which the pixel data is output.

[5] 5. The storage medium of claim 1, wherein the subtitle data comprises the text
data corresponding to pixel data (PXD) contained in subpicture information and
the control information corresponding to display control information
(SP_DCSQT).

[6] 6. The storage medium of claim 5, wherein the subtitle data is in a text format. 

[7] 7. The storage medium of claim 5, wherein the subtitle data is in a packet format. 

[8] 8. The storage medium of claim 1, wherein the subtitle data comprises the text
data corresponding to a presentation composition segment (PCS) contained in
presentation data, and the control information corresponding to an object
definition segment (ODS).

[9] 9. The storage medium of claim 8, wherein the subtitle data is in a text format.

[10] 10. The storage medium of claim 8, wherein the subtitle data is in a packet
format.

11. An apparatus to reproduce information from a storage medium comprising
moving picture data and subtitle data to be output as a graphic overlapping on an
image based on the moving picture data, the apparatus comprising:

a text caption decoder to decode text data contained in the subtitle data and
generate pixel data converted into a bitmap image, and decode and parse control
information contained in the subtitle data to control a caption to be output in real
time; and

a graphic controller to control the pixel data to be output in real time using the
control information.

12. The apparatus of claim 11, wherein the text caption decoder comprises:
a text caption parser to decode and parse the text data and the control
information; and

a font renderer to convert the parsed text data into a bitmap image so that the
parsed text is output as the graphic overlapping the image.

13. The apparatus of claim 12, wherein the text caption parser decodes and
parses style information from the text data and specifies an output style of the
caption, and the font renderer converts the text data into the bitmap image
reflecting the parsed style information.

14. The apparatus of claim 12, wherein the text caption parser parses the text
data and transfers the parsed text data to the font renderer.

15. The apparatus of claim 12, wherein the text caption parser parses time
information indicating a time at which the pixel data is generated in a buffer
memory and a time at which the pixel data is deleted in the buffer memory, and position
information recording a position at which the pixel data is output, from the
control information, and transfers the parsed information to the graphic
controller; and

the graphic controller controls the pixel data to be output in real time by using
the parsed time information and position information.

16. The apparatus of claim 11, wherein the subtitle data comprises:

the text data corresponding to pixel data (PXD) contained in subpicture
information of a DVD formed by a bitmap image reproducing method; and
the control information corresponding to display control information
(SP_DCSQT).

17. The apparatus of claim 16, wherein the text caption parser transfers the text
data to the font renderer, and the control information to the graphic controller;
and

the graphic controller controls the pixel data to be output in real time by using
the transferred control information.

18. The apparatus of claim 11, wherein the subtitle data comprises: 

the text data corresponding to a PCS contained in presentation data of a Blu-ray 
disc formed by a bitmap image reproducing method; and 
the control information corresponding to an ODS. 

19. The apparatus of claim 18, wherein the text caption parser transfers the text 
data to the font renderer and the control information to the graphic controller; 
and 

the graphic controller controls the pixel data to be output in real time by using 
the transferred control information. 

20. A method of reproducing information from a storage medium comprising
moving picture data and subtitle data to be output as a graphic overlapping on an
image based on the moving picture data, the method comprising:

reading the subtitle data including text data and control information from the
storage medium;

decoding the text data, parsing caption contents and output style information, and
converting the caption contents into pixel data formed as a bitmap image based
on the parsed style information;

decoding the control information, parsing time information to control the pixel
data to be output in real time, and parsing position information to control a
position at which a caption is output; and

outputting the converted pixel data in real time according to the parsed time
information and position information.

21. A storage medium comprising subtitle information to display a caption, the 
subtitle information comprising: 

text data to generate pixel data converted into a bitmap image; and 
control information to control the pixel data to be output in real time. 

22. An apparatus to reproduce information from a storage medium having
subtitle information, the apparatus comprising:

a decoder to decode text data from the subtitle information and generate a bitmap
image, and to decode and parse control information from the subtitle data to
control a caption to be output in real time.

[23] 23. The apparatus of claim 22, further comprising a graphic controller to control
the caption to be output in real time according to the control information.

[24] 24. A method of reproducing subtitle information from a storage medium, the
method comprising:

decoding text data and control information from the subtitle information;
parsing caption contents, output style information, time information, and position
information to control a position at which the caption is output from the text data
and control information;

converting the caption contents into the caption based on the output style
information; and

outputting the caption in real time according to the time information and position
information.

[25] 25. A text caption decoder of an apparatus to reproduce information from a
storage medium comprising moving picture data and subtitle data, the text
decoder comprising:

a text caption parser to decode and parse text data and control information from 
the subtitle data; and 

a font renderer to convert the parsed text data into a bitmap image so that the 
parsed text data is output as a graphic overlapping an image based on the moving 
picture data. 

[26] 26. The text caption decoder of claim 25, wherein the text data as the graphic
overlapping the image is output in real time.